Abstract

Many learning systems must confront the problem of run time
after learning being greater than run time before learning. This
utility problem has been a particular focus of research in
explanation-based learning. In past work we have examined an
approach to the utility problem that is based on restricting the
expressiveness of the rule language so as to guarantee polynomial
bounds on the cost of using learned rules. In this article we
propose a new approach that limits the cost of learned rules
without guaranteeing an a priori bound on the match process or
restricting the expressibility of rule conditions. By making the
learning mechanism sensitive to the control knowledge utilized
during the problem solving that led to the creation of the new
rule (i.e., by incorporating such control knowledge into the
explanation), the cost of using the learned rule becomes bounded
by the cost of the problem solving from which it was learned.


1  Introduction 

The identification of the utility problem in explanation-based
learning (Minton 1988) has prompted considerable research on how
to assure, or at least to improve the chances, that learned
knowledge which is intended to speed up system performance will
in fact do so, rather than slow it down. Our own efforts on the
utility problem have focused on two subissues with respect to
Soar, an architecture that combines general problem solving
abilities with a chunking mechanism that is a variant of
explanation-based learning (Rosenbloom, Laird, Newell, and McCarl
1991). The first subissue is the problem of expensive chunks
(Tambe, Newell, and Rosenbloom 1990; Tambe and Rosenbloom 1990),
in which individual learned rules are so expensive to use that
the system suffers a slowdown from learning. The second subissue
is the average growth effect (Doorenbos, Tambe, and Newell 1992;
Doorenbos 1993), in which the system learns so many rules, none
of which individually need be all that expensive, that a slowdown
results.

In this article we focus on expensive chunks. The prior work on
expensive chunks demonstrated their existence in a number of
tasks, identified their origins in the exponential (in the number
of rule conditions) upper bound on the cost of matching
individual rules, and investigated a range of possible
restrictions on the expressibility of the rules that permit
polynomial upper bounds on match cost. The most successful of
these restrictions is unique-attributes, in which match cost is
bounded by a linear function of the number of conditions.
Imposition of the unique-attributes restriction disallows object
attributes from having more than one value. Values can be
structured objects with many parts, but they cannot be
unstructured sets of objects. Figure 1-a shows an unrestricted
encoding for part of a state in the blocks world. The attribute
block of object S1 is not a unique attribute because it has three
distinct values. Figures 1-b and 1-c show two different
unique-attribute encodings of the same structure.


(S1 ^type state)
(S1 ^block B1)
(S1 ^block B2)
(S1 ^block B3)

(a)


(S1 ^type state)
(S1 ^block B1)
(B1 ^next B2)
(B2 ^next B3)

(b)


(S1 ^type state)
(S1 ^focus B2)
(B2 ^left B1)
(B2 ^right B3)

(c)


Figure 1: Unrestricted (a) and unique-attribute (b-c)
encodings in the blocks world.


Although a number of systems have been successfully recoded into
unique-attributes, and reaped significant time savings as a
result, there are still some outstanding problems with it. In
particular, the encoding radically increases the number of rules
used in specifying some tasks, and may also require many more
rules to be learned to achieve the same level of coverage (that
is, generality) as was previously attainable by a small number of
rules.

In this article, we propose an alternative diagnosis for the
cause of expensive chunks, along with a new approach for
eliminating expensive chunks that is derived from this new
diagnosis. The core idea is to focus on the relationship between
the problem-space search upon which the learning is based and the
search performed, during match, by the rule learned from this
problem-space search. In the search of the problem space, some
path (that is, some sequence of operators) is followed that
eventually leads to a result. The actual path followed usually
depends on meta-level control rules that determine which
operators are selected for which states. These control rules
should affect only the efficiency with which the result is found,
and not its correctness. As a result, when a new rule is acquired
from a trace of this problem solving, the control rules are not
included as part of the explanation of the result. This omission,
which turns out to also be the approach taken in PRODIGY (Minton
1993),[1] increases the generality of the learned rules, while it
should not affect their correctness.[2]

The problem with this approach, however, is that the learned
rules are not now constrained by the path actually taken in the
problem space, and thus can perform an exponential amount of
search even when the original problem-space search was highly
directed (by the control rules). For example, with suitable
control knowledge in the Grid Task (Tambe, Newell, and Rosenbloom
1990) it is possible to solve the problem of finding a path
between two nodes in time that is linear in the length of the
path. However, the rule learned from this search may be so
general that, when it matches, it searches over all paths of that
length. This rule is quite general, as it can solve any problem
that has a solution of that length; however, this generality is
only obtained at an enormous cost (i.e., the cost is exponential
in the length of the path).
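
As a rough illustration of the cost gap described above, the
following Python sketch contrasts a chain of generic connected
tests with a chain of direction-specific tests on an unbounded
grid. It is not Soar's matcher: the grid model, the MOVES table,
and both function names are assumptions made for this example,
and only complete chains are counted (a real matcher would also
build partial instantiations along the way).

```python
from itertools import product

# Hypothetical grid model: each location is connected to four neighbours.
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}


def count_undirected_matches(start, goal, length):
    """Enumerate every chain of `length` generic `connected` links from start.

    Returns (chains examined, chains that end at goal). The number of
    chains examined is 4 ** length, i.e. exponential in the path length.
    """
    examined = 0
    matches = 0
    for steps in product(MOVES.values(), repeat=length):
        examined += 1
        x, y = start
        for dx, dy in steps:
            x, y = x + dx, y + dy
        if (x, y) == goal:
            matches += 1
    return examined, matches


def count_directed_steps(start, directions):
    """Follow one recorded sequence of directions.

    Each direction-specific test has exactly one binding, so the work is
    linear in the number of conditions. Returns (steps taken, end point).
    """
    x, y = start
    for direction in directions:
        dx, dy = MOVES[direction]
        x, y = x + dx, y + dy
    return len(directions), (x, y)


examined, matches = count_undirected_matches((0, 0), (2, 2), 4)
steps, end = count_directed_steps((0, 0), ["right", "right", "up", "up"])
```

For a path of length four, the undirected version examines 256
chains to find the 6 that reach the goal, while the directed
version takes 4 steps along the single recorded path.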

The solution suggested by this diagnosis is to incorporate traces
of the control rules utilized in the problem-space search into
the explanation of the result. This should enable the match
process for learned rules to focus on just the precursors for the
path that was actually followed, and thus ensure that the match
process for a learned rule is bounded in complexity by the
problem-space search from which it was learned. Because the match
process runs at a faster rate than the problem solving process,
this should solve the expensive chunks problem by ensuring that
using the learned rule takes no more time than was taken by the
original search.


[1] In Prodigy, selection and rejection rules are included in the
explanation, but preference rules are not. Likewise, Soar
currently also includes require and prohibit preferences, but not
desirability preferences.

[2] In Soar, this actually can at times affect correctness, but
the discussion of this will be postponed to the final section.


This approach is closest in spirit to that taken in (Shell and
Carbonell, 1991). In that work, iterative paths found during
problem-space search resulted in the addition of iterative
constructs to the macro-operators acquired from the search. These
iterative macro-operators are then used in a way that guarantees
that they take the same path followed in the problem space. Shell
and Carbonell claim that their approach solves the expensive
chunks problem. However, it does not do so completely, because
not all expensive chunks arise from iteration. Our approach
captures the same basic intuition, but in a manner that is both
more general and simpler. It is more general because it captures
the factors that determined the entire path, rather than just the
iterative portions, and thus handles all of the causes of
expensive chunks. It is simpler because it does not require an
enhanced macro-language or special purpose mechanisms for
detecting iteration. Instead, it simply expands by a small amount
the content of the explanation used during learning.

In contrast to our earlier approaches to expensive chunks, this
new approach imposes no expressibility limitations on the
encoding of tasks. On the positive side, this means that the
problem of expensive chunks can be solved without increasing the
difficulty of task encoding. On the negative side, this means
that no sub-exponential bound is being imposed on the match
process: if the original rules encoded into the system require
exponential matches, then so may the learned rules. We have thus
effectively split off the goal of removing expensive chunks from
the related goal of guaranteeing bounds on the match, and in the
process found a weaker approach that solves the former but not
the latter, but with no limit on task expressibility.

Despite this result, this new approach is not free of problems.
One significant problem is that it doesn't specify what to do
when decisions in a search are based on lack of knowledge. In
such circumstances, the learning process has no explanation for
why a choice was made, and therefore can acquire rules that are
just as expensive as those learned by the unaltered learning
mechanism. The other significant problem is that, as with
unique-attributes, this approach can lead to learned rules that
are less general than would be acquired by the unaltered learning
mechanism. This comes about here, not because of limitations on
the representation, but because additional conditions are
incorporated into learned rules based on control rules that are
now part of the explanation. These conditions provide efficiency,
but at the cost of eliminating search that otherwise would allow
the rules to apply in more circumstances.

In the next section we look at this first, lack-of-knowledge
problem in more detail, and identify two possible solutions. In
the subsequent section we present experimental results from using
the new approach to expensive chunks in combination with one of
the proposed solutions to the lack-of-knowledge problem; in
particular, a solution that depends on a novel restriction on the
expressiveness of the resulting system. In the process we will
discuss the impact of the second, specialization-of-learned-rules
problem. The final section summarizes and discusses issues for
future work.

2  Decisions  Based  on  Lack  of 
Knowledge 

In Soar, a real lack of knowledge, as reflected in an
insufficient set of preferences about a decision, leads to an
impasse rather than to a decision. Thus it might seem that Soar
wouldn't suffer from this problem. However, it does have a
construct, an indifferent preference, that allows the explicit
statement of indifference among a set of choices. The decision
procedure is then free to select randomly among the indifferent
choices. The resulting choice is thus made in such a way that no
explanation of the selection among the indifferent alternatives
is possible based just on the initial situation.


(a) [Grid diagram not reproduced.]

(sp operator-goto
  (goal <g> ^problem-space <p>
            ^state <s>)
  (<p> ^name grid-task)
  (<s> ^at <loc1>)
  (<loc1> ^connected <loc2>)
  -->
  (<o> ^name goto-loc
       ^at <loc1> ^to <loc2>)
  (<g> ^operator <o>))

(b)

Figure 2: The Grid Task.

Consider an example from the Grid Task, a problem known to lead
to expensive chunks (Tambe, Newell, and Rosenbloom 1990), shown
in Figure 2-a. The problem is to go from point F to point P, a
path of length four. Because point F is connected to four
adjacent points, four operators are suggested by rule
operator-goto (Figure 2-b).[3] Since the knowledge required to
choose among them is not directly available in productions, an
impasse occurs on operator selection. In the subgoal created for
this impasse, Soar normally employs the selection problem space,
which contains evaluate operators that can be applied to the
competing task operators. Once generated, these evaluations will
be turned into preferences that allow one of the task operators
to be selected. However, the system has no direct knowledge about
which of the four operators it ought to evaluate first, so
without further assistance it would impasse again, and possibly
continue this recursive subgoaling indefinitely. To avoid this,
one of Soar's general background rules generates indifferent
preferences for the set of evaluate operators. This lets it pick
one at random, and begin to make progress.

[3] Symbols enclosed in angle brackets are variables.


Figure 3: Problem solving in the Grid Task.

If, as is often the case, the information about how to evaluate
an operator is not directly available, an evaluation subgoal (to
implement the evaluate operator) is created. The task in this
third-level subgoal is to determine the utility of the operator.
To do this, it performs a bit of lookahead search, trying out the
task operator (possibly in simulation) on the original task
state. If the resulting state can be evaluated, then the subgoal
terminates; otherwise the process continues, recurring on the
question of what task operator to apply to this new state. Figure
3 shows this search process in the Grid Task, which continues
until the point P is reached.

In this overall lookahead search, indifferent preferences
indirectly determine which path the system moves down, by
directly determining which of the operators are evaluated at each
point. However, the rules learned from this search can gather no
explanation from the indifferent preferences as to why one path
was taken rather than another. Figure 4 shows such a learned
rule. This rule says that if you are at location <l1> and want to
get to location <l5>, and there is an operator that takes you
from <l1> to <l2>, and there is a connected path from <l2> to
<l5> (via two intermediate points, <l3> and <l4>), then the
operator is the best choice. This rule is expensive because it
may need to search an exponential number of paths of length four
to find one that has this property. Even if the original
problem-space search happened to locate the correct path by
accident on its first try, or if outside guidance was provided to
lead it down the correct path, the resulting rule would still
incorporate this exponential search.

(sp chunk-example
  chunk
  (goal <g> ^problem-space <p> ^state <s>
            ^operator <o> + ^desired <d>)
  (<o> ^name goto-loc ^at <l1> ^to <l2>)
  (<p> ^name grid-path)
  (<s> ^at <l1>) (<d> ^at <l5>)
  (<l2> ^connected <l3>) (<l3> ^connected <l4>)
  (<l4> ^connected <l5>)
  -->
  (<g> ^operator <o> >))

Figure 4: An expensive chunk learned from indifferent
choices.

There are (at least) two possible ways of solving this problem.
The first is to alter the learning and match processes so that
they more appropriately reflect the semantics of indifferent
preferences. Use of an indifferent preference means that a random
selection of a single path should be made. However, the match
algorithm always follows all paths. So, reflecting the semantics
of indifference should involve altering the learning and match
processes so that use of indifferent preferences during
problem-space search yields the random choice of a single
alternative during the corresponding part of the match of the
learned rule. If, in fact, the indifferent preference meant that
the system really didn't care which of the paths was taken, then
any random selection made by the matcher should be as good as any
other. If, however, the indifferent preference actually signified
lack of knowledge about the correct path, and not all paths
actually do lead to success, then the match will follow one path
randomly, and thus will succeed only stochastically.

This first direction looks pretty interesting. It solves the
problem without introducing an expressibility limitation, while
at the same time introducing a stochasticity into the use of
learned rules, and a resulting gradualness in performance
improvement that may be quite useful in modeling human cognition.
However, it requires a significant enough alteration in the basic
architecture of Soar that we have decided to first investigate a
simpler alternative, and leave this one for future work.

The second way of solving the problem, and the one underlying the
results reported here, is to disallow the use of indifferent
preferences. Their ability to select randomly among alternatives
is then replaced by explicit default orderings on the
alternatives. If there are any substantive reasons why one
alternative should be selected ahead of another, they can be
incorporated into this ordering. To the extent that there are no
substantive reasons, an arbitrary ordering can be imposed. The
key, though, is that these orderings are generated explicitly by
rules that distinguish among the alternatives, and therefore
leave behind a trace that can be used in explaining why one
alternative is picked over the others. This may not provide a
"good" explanation, in the sense of capturing a suitable level of
generality to support transfer to related situations; however, it
will at least be sufficient to distinguish the one selected
alternative from the others during the match, and thus to make
the resulting learned rules cheap.

For the Grid Task, an arbitrary ordering of the operators can be
assigned according to the direction of movement. For example,
first up, then down, then left, and finally right. It is
important to note that this ordering is just used in place of the
indifferent preferences on the evaluate operators in the
selection space. Thus it determines the order in which the
operators are evaluated, but does not dictate an ordering on the
task operators. This latter ordering is still to be learned, as a
new set of control rules, from the lookahead search.
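
To make the role of a default ordering concrete, here is a
minimal Python sketch of a depth-limited lookahead that always
evaluates moves in the fixed order up, down, left, right. It is
only an illustration under stated assumptions: the function name
lookahead, the coordinate model, and the bounded 10x10 grid
default are choices made for this example and do not reproduce
Soar's selection space or its subgoaling.

```python
# Arbitrary default evaluation order, as in the text: up, down, left, right.
ORDER = ["up", "down", "left", "right"]
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}


def lookahead(start, goal, depth, size=10):
    """Depth-limited search that tries moves in the fixed ORDER.

    Returns the list of directions on the first path found, or None.
    Because the order is fixed, the same problem always yields the same
    trace, so every choice on the path can be explained by the ordering.
    """
    def search(pos, remaining):
        if pos == goal:
            return []
        if remaining == 0:
            return None
        for direction in ORDER:
            dx, dy = MOVES[direction]
            nxt = (pos[0] + dx, pos[1] + dy)
            # Stay inside the bounded grid (hypothetical size parameter).
            if 0 <= nxt[0] < size and 0 <= nxt[1] < size:
                rest = search(nxt, remaining - 1)
                if rest is not None:
                    return [direction] + rest
        return None

    return search(start, depth)


trace = lookahead((0, 0), (2, 2), 4)
```

Here trace is the direction sequence up, up, right, right. A rule
built from such a trace can test direction-specific links rather
than a generic connection, which is what keeps its match cheap.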

The elimination of indifferent preferences amounts to a
limitation on the system's expressibility, though of a form quite
different from those previously investigated. It also clearly may
impact the generality of the resulting rules, at least to the
extent that arbitrary orderings are imposed. As such, it needs to
be evaluated, just as was the unique-attributes restriction, in
terms of the trade-offs it provides among expressibility, speed,
and generality.

3  Experimental  Results 

In this section we look at how well the incorporation of search
control into learned rules, in combination with the elimination
of indifferent preferences, compares with both an unaltered
version of Soar and a unique-attributes version. The results are
all from Soar6 (version 6.0.3), the latest C-based release of
Soar (Doorenbos 1992), which is approximately 10-40 times faster
than Soar5 (the previous Lisp-based release). The experimental
version is just like the standard system, except that the
explanations upon which new rules are based incorporate traces of
the control rules that determined the choices made in problem
solving. In particular, the system computes the minimum set of
preferences sufficient to determine each choice that was made, so
that if the set of preferences overdetermines the choices, the
redundant preferences (and their rule traces) are pruned from the
explanation to make the created rule as general as possible.

                      Average CPU time (sec)
                    Before learning   After learning
Original                 5.38             24.79
Search control           6.83              1.18
Unique-attribute         6.82              0.95

Table 1: Average CPU time in the Grid Task.


(sp chunk-search-control
  chunk
  (goal <g> ^operator <r> ^operator <u>
            ^operator <d> + ^desired <d1> ^state <s>)
  (<r> ^priority 4 ^at <l1> ^to <l2>)
  (<u> ^priority 3) (<d> ^priority 1)
  (<s> ^at <l1>) (<d1> ^at <l8>)
  (<l2> ^right <l3> ^connected <l3>
        ^down <l4> ^connected <l4>
        ^up <l5> ^connected <l5>)
  (<l3> ^up <l6> ^connected <l6>
        ^down <l7> ^connected <l7>)
  (<l6> ^connected <l8> ^up <l8>)
  -->
  (<g> ^operator <r> >))

(a)

(sp chunk-unique-attribute
  chunk
  (goal <g> ^operator-right <r> +
            ^desired <d> ^state <s>)
  (<r> ^at <l1> ^to <l2>)
  (<s> ^at <l1>) (<d> ^at <l5>)
  (<l2> ^right <l3>)
  (<l3> ^up <l4>)
  (<l4> ^up <l5>)
  -->
  (<g> ^operator <r> >))

(b)

Figure 5: Chunks from search-control and unique-attribute
versions.


Table 1 shows the average CPU time per problem (in seconds) for
the three versions, across seven different problems in the Grid
Task, both before and after learning. All of the grid problems
used here are searches for paths of length six; for example, in
Figure 2-a, a problem to go from point A to point P is a
length-six problem. For experimental efficiency, the results
shown here also assume a 10x10 bounded grid instead of the
unbounded grid in Figure 2-a. The first row in Table 1 shows the
times before and after learning for the unaltered version of
Soar. Without including search control in chunking, or
restricting the task representation, the time after learning is
greater than the time before learning for all these problems (by
an average factor of 4.61), and for one of the problems it is
more than a factor of nine greater. This is true even though the
number of problem solving steps (i.e., decisions) is decreased
via learning from 133 to 8. This extra cost is directly
attributable to the large amount of time spent matching expensive
chunks.


Figure 6: Number of accumulated chunks in the Grid Task.
[Plot not reproduced; x-axis: Grid problems; series: original
Soar6, search control, unique-attribute.]

The second and third rows in Table 1 show the corresponding CPU
times for the search-control and unique-attributes versions of
the Grid Task.[4] Both show more than a factor of five reduction
in execution time after learning. In each problem, they show
essentially the same pattern: the time after learning is a small
constant value that is uniformly less than the time before
learning. This implies that both have solved the expensive chunks
problem for this task.
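
The factors quoted for the Grid Task can be checked against the
averages in Table 1. The short Python sketch below does that
arithmetic; the dictionary layout and variable names are
assumptions made for this example. Note that it takes the ratio
of the two reported averages, which may differ in general from an
average of per-problem ratios, although here it agrees with the
4.61 figure quoted in the text.

```python
# Average CPU times (sec) from Table 1 as (before learning, after learning).
TABLE_1 = {
    "original": (5.38, 24.79),
    "search control": (6.83, 1.18),
    "unique-attribute": (6.82, 0.95),
}

# Slowdown of the unaltered system: time after learning / time before.
before, after = TABLE_1["original"]
original_slowdown = round(after / before, 2)

# Speedup of each version: time before learning / time after.
speedups = {
    name: round(b / a, 2) for name, (b, a) in TABLE_1.items()
}
```

With these values, the original version slows down by a factor of
about 4.61, while the search-control and unique-attribute
versions speed up by factors of about 5.79 and 7.18 respectively,
both above the factor of five mentioned in the text.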

The extra time before learning in the search-control and
unique-attribute versions stems from the increase in tokens
brought about by the additional rule conditions that discriminate
among moving directions, as shown in the conditions of the chunks
in Figure 5. These two chunks correspond to the expensive chunk
in Figure 4. The difference in run times after learning between
the search-control version and the unique-attribute version in
Table 1 is also due to the extra conditions in the
search-control-version chunks. However, this yields only a minor
effect, as analyzed in (Tambe 1991).

Figure 6 shows the cumulative number of chunks acquired while
solving the eight Grid-Task problems. The unmodified version of
Soar learned general enough chunks from the first problem to
cover all of the other length-six problems. The other two
approaches needed to learn additional chunks for each new
problem. In these problems, both learned the same number of rules
with the same generality. Although there are additional
constraints induced by the extra conditions in Figure 5-(a), both
chunks in Figure 5 have the same generality in that they describe
the same grid path followed by the lookahead search to reach the
desired point, and nothing more than that.


[4] The unique-attribute representation replaces the
multi-attribute ^connected with four distinct attributes ^up,
^down, ^left and ^right.

Eight-puzzle Task     Average CPU Time (sec)
                    Before Learning   After Learning
Original                 5.31              9.75
Search control           5.30              1.55
Unique-attribute         5.72              1.21

Table 2: Average CPU time in the Eight-puzzle Task.


Table 2 compares these three methods on the Eight-puzzle Task,
another task known to produce expensive chunks (Tambe, Newell and
Rosenbloom 1990). In the multi-attribute representation, a state
points to nine bindings (using attribute ^binding), each of which
connects a cell from the static 3x3 structure of the board to a
tile. For example, in (B1 ^cell C1) (B1 ^tile T1), binding B1
connects cell C1 to tile T1. A cell points to all of its
neighboring cells. For example, (C1 ^next C2), (C1 ^next C3), and
so on.

The search-control version used here distinguishes operators by
the direction that they move the blank cell: down, up, left and
right. The unique-attribute representation removes the
multi-attribute ^binding by numbering it (^binding1, ^binding2,
..., ^binding9), and replaces ^next with 4 attributes ^down, ^up,
^left and ^right. As shown in Table 2, the time after learning is
less than the time before learning in the search-control and
unique-attribute versions, while the original Soar requires more
CPU time after learning. As in the Grid Task, both the
search-control and unique-attribute versions have eliminated the
expensive chunks that occur in the unmodified version. The
difference in run times between the search-control version and
the unique-attribute version stems from the same reason as in the
Grid Task.

The number of rules used to encode the Eight Puzzle in the three
versions tells an interesting story. The original version of Soar
uses 13 rules, the search-control version uses 16 rules, and the
unique-attributes version uses 93 rules. The small growth in
going from Soar to the search-control version stems from the need
to differentiate and provide a default order on the domain
operators. If there are n possible operators, n-1 additional
rules are required. The large growth in the unique-attributes
version stems from the need to create rules for each
specialization of a generic attribute into a more specialized
unique-attribute. In general, if there are n tests of
multi-attributes in a rule, each with m possible specializations,
then the unique-attributes version will need to substitute m^n
specialized rules.[5]
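
The two growth rates described above can be written down directly.
The following Python sketch is only a restatement of the formulas
in the text (n-1 ordering rules, m^n specialized rules); the
function names are hypothetical.

```python
def extra_ordering_rules(n_operators: int) -> int:
    """Rules needed to impose a default order on n operators: n - 1."""
    return max(n_operators - 1, 0)


def specialized_rule_count(n_tests: int, m_specializations: int) -> int:
    """Rules replacing one rule that has n multi-attribute tests,
    each with m possible unique-attribute specializations: m ** n."""
    return m_specializations ** n_tests


# Four Eight-puzzle operators (down, up, left, right) need 3 extra rules.
search_control_growth = extra_ordering_rules(4)

# Example: a rule with 2 tests of an attribute that has 4 specializations.
unique_attribute_rules = specialized_rule_count(2, 4)
```

With four operators, the first formula gives 3 extra rules, which
is consistent with the reported growth from 13 to 16 rules. The
second grows exponentially in the number of tests: a single rule
with two such tests already expands to 16 specialized rules.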

Figure  7  shows  the  cumulative  number  of  chunks 
learned  while  solving  the  nine  Eight-puzzle  problems. 


[5] There are ways to reduce this number by splitting these tests
across a sequence of rules, but that approach also has its own
problems.


Figure 7: Number of accumulated chunks in the Eight-puzzle
Task.

The  search-control  version  required  32  new  rules,  as 
compared  to  109  new  rules  for  the  unique-attribute 
version,  and  22  new  rules  for  the  original  version. 
Thus,  for  this  task,  the  search-control  version  is  quite 
close  to  the  original  Soar  version,  and  both  show  a 
distinct  advantage  over  unique-attributes. 

Unique-attributes' need here for many new rules stems from the
same source as its large number of initial rules, plus the
following fact. The search-control version need not distinguish
between values of a multi-attribute as long as it doesn't affect
the decision that is based on lack of knowledge, while the
unique-attribute version must replace the multi-attribute anyhow.
Attribute ^binding in the above Eight-puzzle task is an example
of a multi-attribute that doesn't affect the decision based on
lack of knowledge.


Figure 8: Number of chunks in different Eight-puzzle
representations.

The Eight Puzzle can also be expressed via a different set of
rules, without the multi-attribute ^binding. Although there is
considerable reduction in the number of rules, Figure 8 shows
that the unique-attribute version still needs more rules because
of the former effect.

4  Summary  and  Discussion

Unique-attributes solve the expensive chunks problem by
restricting the expressiveness of rules down to where the match
can be guaranteed to run in polynomial (in particular, linear)
time. This provides strong assurances about system performance,
but also negatively impacts task creation and learned-rule
generality. Here we have proposed and investigated a new
approach, based on including search control in the explanations
upon which new rules are based, that solves the expensive chunks
problem, but not by enforcing a fixed computational bound on the
match process. Instead, the complexity of the match of a learned
rule is bounded by the complexity of the search from which it was
learned. This gives up an overall guarantee on system
performance, but given an initially encoded system, learning will
not make it worse. In exchange for this weakening of the
guarantee, this new approach shows potential for ameliorating
both of the negative side effects introduced by
unique-attributes.

One additional positive side effect of the search-control
approach is that it removes one possible source of
overgeneralization in Soar (Laird, Rosenbloom, and Newell
1986). Though search control is not supposed to affect
the correctness of results generated in problem spaces,
it sometimes unavoidably does. In situations in which
results are returned from a problem space before the
goal test succeeds, or where the goal test is itself
overgeneral, search control may play an influential role in
determining the correctness of the result. Under such
circumstances, the current approach, which does not include
this search control in the explanation process, can
yield overgeneral learned rules. However, by including
this search control in the explanation of the result,
the proposed approach removes this potential source
of overgenerality.

A possible negative side effect of the search-control
approach is that it increases the difficulty of directing the
reconstruction process that underlies Soar's approach
to knowledge-level learning (Rosenbloom, Laird, and
Newell 1987; Rosenbloom and Aasman 1990). There
we took advantage of search control's absence from
explanations in learning a rule whose actions mirrored
some perceived object structure, but whose conditions
did not test the perceived object. With this option no
longer available, a new approach must be employed.
One possibility, which was already under investigation
independently of this work, is a form of situated
reconstruction, in which reconstruction is guided
by features of the immediate situation other than those
to be reconstructed (Vera, Lewis, and Lerch 1993).

In addition to investigating options for knowledge-level
learning, several other issues need near-term attention.
At the top of the list is extending the experimental
results to a wider range of tasks (both those that
traditionally yield expensive chunks and those that don't)
and to quantitative analyses of speedups and (losses
of) generality. Also useful would be a theoretical
analysis of the method, and of its potential to avoid (or
lead to) slowdowns with learning. There is also a subtle
issue that needs to be addressed, one that occurs only
when there are more options available at performance
time than at learning time: if the conditions
learned to discriminate among the options available at
learning time are not sufficient to discriminate among
the new options, additional match search may be
introduced. Finally, altering the architecture so as to
permit the appropriate use of indifferent preferences
would enable the removal of the one expressibility
limitation that it was found necessary to impose.
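
The subtle issue noted above, where more options are available at performance time than at learning time, can be illustrated with a toy sketch. This is not Soar code: the options, their features, and the `learned_condition` predicate are all hypothetical, and the matcher is reduced to a simple filter. It only shows how a condition that singled out one option at learning time can match several options later.

```python
from typing import Callable, Dict, List

Option = Dict[str, str]


def matching_options(condition: Callable[[Option], bool],
                     options: List[Option]) -> List[Option]:
    """Return every option that satisfies the learned rule's condition."""
    return [opt for opt in options if condition(opt)]


def learned_condition(opt: Option) -> bool:
    # Hypothetical learned test: at learning time, checking color alone
    # was enough to pick out the chosen option.
    return opt["color"] == "red"


learning_time_options = [
    {"name": "a", "color": "red", "shape": "square"},
    {"name": "b", "color": "blue", "shape": "square"},
]

# At performance time one extra option appears that the learned
# condition cannot tell apart from the original choice.
performance_time_options = learning_time_options + [
    {"name": "c", "color": "red", "shape": "circle"},
]

at_learning = matching_options(learned_condition, learning_time_options)
at_performance = matching_options(learned_condition, performance_time_options)

print(len(at_learning))     # 1
print(len(at_performance))  # 2
```

With two candidates matching instead of one, the matcher has extra alternatives to explore, which is the kind of additional match search the text warns about.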